Sains Malaysiana 54(1)(2025): 279-290
http://doi.org/10.17576/jsm-2025-5401-22
Optimizing Tuberculosis Treatment Predictions: A Comparative Study of XGBoost with Hyperparameter in Penang, Malaysia
(Mengoptimumkan Peramalan Rawatan Tuberkulosis: Suatu Kajian Perbandingan XGBoost dengan Hiperparameter di Penang, Malaysia)
YANIZA SHAIRA ZAKARIA1, NUR AFIQAH ARIFFIN2,*, AZIZUL AHMAD3, RUSLAN RAINIS2, AIDY M. MUSLIM1 & WAN MOHD MUHIYUDDIN WAN IBRAHIM2
1Institute of Oceanography and Environment (INOS), Universiti Malaysia Terengganu, 21030 Kuala Nerus, Terengganu, Malaysia
2Geography Section, School of Humanities, Universiti Sains Malaysia (USM), 11800 Pulau Pinang, Malaysia
3Centre for Spatially Integrated Digital Humanities (CSIDH), Faculty of Social Sciences & Humanities, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia
Received: 24 April 2024/Accepted: 4 November 2024
Abstract
Mycobacterium tuberculosis is a bacterium that causes an infection primarily affecting the lungs, although other organs such as the liver can also be involved. Tuberculosis (TB) is a significant public health concern in developing countries, where it is often associated with poverty, poor living conditions, and limited access to healthcare services. According to the World Health Organization (2023), TB continues to pose a substantial risk to public health on a global scale, with millions of people affected each year and around 1.5 million deaths in 2020. Healthcare providers often encounter significant challenges in addressing TB, leading to uncertain treatment outcomes. This study introduces a novel approach to enhancing TB treatment using machine learning techniques, with emphasis on XGBoost alongside other predictive models, to predict individual treatment outcomes from clinical data in Penang State, Malaysia. Clinical data were anonymized before analysis, and the models were trained on 2017 Penang data. Accuracy, determined by comparing predicted outcomes with observed outcomes, was used to identify the best-performing method. On the 2017 data, the decision tree achieved 63.7% accuracy, logistic regression 63.3%, and XGBoost 66.3%; XGBoost with tuned hyperparameters performed best at 68.1%. These results indicate that supervised learning can predict TB treatment outcomes, and that calibrated ensemble models such as XGBoost make the most reliable predictions among the models compared. Additional clinical characteristics may improve forecasts. The primary objective was to develop a reliable, clinically validated instrument that enhances TB treatment while optimizing resource efficiency across diverse healthcare environments.
Keywords: Classification; hyperparameter; logistic regression; prediction; random forest; tuberculosis
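The accuracy comparison described in the abstract can be illustrated with a minimal sketch. It assumes accuracy means the share of cases in which the predicted outcome matches the observed one, which is the usual definition; the abstract does not spell out the formula. The function names `accuracy` and `best_model`, the binary outcome coding, and the toy labels are hypothetical and are not taken from the study. Only the four accuracy values in `reported` come from the abstract.

```python
from typing import Dict, Sequence


def accuracy(observed: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of cases where the predicted outcome equals the observed one."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one case is required")
    correct = sum(1 for o, p in zip(observed, predicted) if o == p)
    return correct / len(observed)


def best_model(scores: Dict[str, float]) -> str:
    """Name of the model with the highest accuracy."""
    return max(scores, key=scores.get)


# Toy labels (hypothetical, NOT study data): 1 = successful treatment, 0 = otherwise.
toy_observed = [1, 0, 1, 1, 0, 1, 0, 1]
toy_predicted = [1, 0, 0, 1, 0, 1, 1, 1]
toy_accuracy = accuracy(toy_observed, toy_predicted)  # 6 of 8 cases match

# Accuracies as reported in the abstract for the 2017 Penang data.
reported = {
    "Decision tree": 0.637,
    "Logistic regression": 0.633,
    "XGBoost": 0.663,
    "XGBoost (tuned hyperparameters)": 0.681,
}

if __name__ == "__main__":
    print(f"toy accuracy: {toy_accuracy:.3f}")
    print(f"best reported model: {best_model(reported)}")
```

Ranking by a single accuracy figure mirrors the comparison in the abstract. With imbalanced outcome classes, additional metrics such as sensitivity and specificity would usually be examined as well.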
REFERENCES
Abdullahi, O.A., Ngari, M.M., Sanga, D.,
Katana, G. & Willetts, A. 2019. Mortality during treatment for
tuberculosis; a review of surveillance data in a rural county in Kenya. PLoS
ONE 14(7): e0219191. https://doi.org/10.1371/journal.pone.0219191
Ahmad, A., Kelana, M.H., Soda, R., Jubit,
N., Mohd Ali, A.S., Bismelah, L.H. & Masron, T. 2024a. Mapping the impact:
Property crime trends in Kuching, Sarawak, during and after the COVID-19 period
(2020-2022). Indonesian Journal of Geography 56(1): 127-137.
https://doi.org/10.22146/ijg.90057
Ahmad, A., Masron, T., Jubit, N., Redzuan,
M.S., Soda, R., Bismelah, L.H. & Mohd Ali, A.S. 2024b. Analysis of the
movement distribution pattern of violence crime in Malaysia’s capital
region-Selangor, Kuala Lumpur, and Putrajaya. International Journal of
Geoinformatics 20(2): 11-26. https://doi.org/10.52939/ijg.v20i2.3061
Ahmad, A., Masron, T., Junaini, S.N.,
Barawi, M.H., Redzuan, M.S., Kimura, Y., Jubit, N., Bismelah, L.H. & Mohd
Ali, A.S. 2024c. Criminological insights: A comprehensive spatial analysis of
crime hot spots of property offenses in Malaysia’s urban centers. Forum
Geografi: Indonesian Journal of Spatial and Regional Analysis 38(1):
94-109. https://doi.org/10.23917/forgeo.v38i1.4306
Ahmad, A., Masron, T., Junaini, S.N.,
Kimura, Y., Barawi, M.H., Jubit, N., Redzuan, M.S., Bismelah, L.H. & Mohd
Ali, A.S. 2024d. Mapping the unseen: Dissecting property crime dynamics in
urban Malaysia through spatial analysis. Transactions in GIS 28(6):
1486-1509. https://doi.org/10.1111/tgis.13197
Ahmad, A., Masron, T., Kimura, Y., Barawi,
M.H., Jubit, N., Junaini, S.N., Redzuan, M.S., Mohd Ali, A.S. & Bismelah,
L.H. 2024e. Unveiling urban violence crime in the state of The Selangor, Kuala
Lumpur and Putrajaya: A spatial–temporal investigation of violence crime in
Malaysia’s key cities. Cogent Social Sciences 10(1): 2347411.
https://doi.org/10.1080/23311886.2024.2347411
Ahmad, A., Masron, T., Mohd Ali, A.S.,
Barawi, M.H., Nordin, Z.S., Abg Ahmad, A.I., Redzuan, M.S. & Bismelah, L.H.
2024f. Exploring the potential of geographic information system (GIS)
application for understanding spatial distribution of violent crime related to
United Nations sustainable development goals-16 (SDGS-16). Journal of
Sustainability Science and Management 19(9): 35-63.
https://doi.org/10.46754/jssm.2024.09.003
Ahmad, A., Masron, T., Mohd Ali, A.S.,
Kimura, Y. & Junaini, S.N. 2024g. Demographic dynamics and urban property
crime: A linear regression analysis in Kuala Lumpur and Putrajaya (2015-2020). Planning
Malaysia: Journal of the Malaysian Institute of Planners 22(4): 302-319.
https://doi.org/10.21837/pm.v22i33.1550
Ahmad, A., Masron, T., Ringkai, E., Barawi,
M.H., Salleh, M.S., Jubit, N. & Redzuan, M.S. 2024h. Analisis ruangan hot
spot jenayah pecah rumah di negeri Selangor, Kuala Lumpur dan Putrajaya
pada tahun 2015-2020. Geografia-Malaysian Journal of Society and Space 20(1): 49-67. https://doi.org/10.17576/geo-2024-2001-04
Ali, A., Alrubei, M.A.T., Hassan, L.F.M.,
Al-Ja’afari, M.A.M. & Abdulwahed, S.H. 2020. Diabetes diagnosis based on
KNN. IIUM Engineering Journal 21(1): 175-181.
https://doi.org/10.31436/iiumej.v21i1.1206
Ariffin, N.A., Wan Ibrahim, W.M.M., Rainis,
R., Samat, N., Mohd Nasir, M.I., Abdul Rashid, S.M.R., Ahmad, A. & Zakaria,
Y.S. 2024. Identification of trends, direction of distribution and spatial
pattern of tuberculosis disease (2015-2017) in Penang. Geografia-Malaysian
Journal of Society and Space 20(1): 68-84. https://doi.org/10.17576/geo-2024-2001-05
Bismelah, L.H., Masron, T., Ahmad, A., Mohd
Ali, A.S. & Echoh, D.U. 2024. Geospatial assessment of healthcare
distribution and population density in Sri Aman, Sarawak, Malaysia. Geografia-Malaysian
Journal of Society and Space 20(3): 51-67.
https://doi.org/10.17576/geo-2024-2003-04
Bukundi, E.M., Mhimbira, F., Kishimba, R.,
Kondo, Z. & Moshiro, C. 2021. Mortality and associated factors among adult
patients on tuberculosis treatment in Tanzania: A retrospective cohort study. Journal
of Clinical Tuberculosis and Other Mycobacterial Diseases 24: 100263.
https://doi.org/10.1016/j.jctube.2021.100263
Chabo, D., Masron, T., Jubit, N. &
Ahmad, A. 2024. Analisis corak ruangan keciciran murid sekolah menengah di
Sarawak. Malaysian Journal of Social Sciences and Humanities 9(9):
e002906. https://doi.org/10.47405/mjssh.v9i9.2906
Dheda, K., Perumal, T., Moultrie, H.,
Perumal, R., Esmail, A., Scott, A.J., Udwadia, Z., Chang, K.C., Peter, J.,
Pooran, A., von Delft, A., von Delft, D., Martinson, N., Loveday, M.,
Charalambous, S., Kachingwe, E., Jassat, W., Cohen, C., Tempia, S., Fennelly,
K. & Pai, M. 2022. The intersecting pandemics of tuberculosis and COVID-19:
Population-level and patient-level impact, clinical presentation, and corrective
interventions. The Lancet Respiratory Medicine 10(6): 603-622.
https://doi.org/10.1016/S2213-2600(22)00092-3
Fayaz, S.A., Babu, L., Paridayal, L.,
Vasantha, M., Paramasivam, P., Sundarakumar, K. & Ponnuraja, C. 2024.
Machine learning algorithms to predict treatment success for patients with
pulmonary tuberculosis. PLoS ONE 19(10): e0309151.
https://doi.org/10.1371/journal.pone.0309151
Gichuhi, H.W., Magumba, M., Kumar, M. &
Mayega, R.W. 2023. A Machine Learning approach to explore individual risk
factors for tuberculosis treatment non-adherence in Mukono district. PLOS
Global Public Health 3(7): e0001466.
https://doi.org/10.1371/journal.pgph.0001466
Gill, C.M., Dolan, L., Piggott, L.M. &
McLaughlin, A.M. 2022. New developments in tuberculosis diagnosis and
treatment. Breathe 18(1): 210149.
https://doi.org/10.1183/20734735.0149-2021
Hrizi, O., Gasmi, K., Ben Ltaifa, I.,
Alshammari, H., Karamti, H., Krichen, M., Ben Ammar, L. & Mahmood, M.A.
2022. Tuberculosis disease diagnosis based on an optimized Machine Learning
model. Journal of Healthcare Engineering 2022: 8950243.
https://doi.org/10.1155/2022/8950243
Hussain, O.A. & Junejo, K.N. 2018.
Predicting treatment outcome of drug-susceptible tuberculosis patients using
machine-learning models. Informatics for Health and Social Care 44(2):
135-151. https://doi.org/10.1080/17538157.2018.1433676
Janssens, R.J., Mourão-Miranda, J. &
Schnack, H.G. 2018. Making individual prognoses in psychiatry using
neuroimaging and Machine Learning. Biological Psychiatry: Cognitive
Neuroscience and Neuroimaging 3(9): 798-808.
https://doi.org/10.1016/j.bpsc.2018.04.004
Jubit, N., Masron, T., Ahmad, A. & Soda,
R. 2024a. Investigating the spatial relation between landuse and property crime
in Kuching, Sarawak through location quotient analysis. Forum Geografi:
Indonesian Journal of Spatial and Regional Analysis 38(2): 153-166.
https://doi.org/10.23917/forgeo.v38i2.4575
Jubit, N., Masron, T., Redzuan, M.S.,
Ahmad, A. & Kimura, Y. 2024b. Revealing adolescent drug trafficking and
addiction: Exploring school disciplinary and drug issues in the Federal
Territory of Kuala Lumpur and Selangor, Malaysia. International Journal of
Geoinformatics 20(6): 1-12. https://doi.org/10.52939/ijg.v20i6.3327
Jubit, N., Masron, T., Puyok, A. &
Ahmad, A. 2023. Geographic distribution of voter turnout, ethnic turnout and
vote choices in Johor state election. Geografia-Malaysian Journal of Society
and Space 19(4): 64-76. https://doi.org/10.17576/geo-2023-1904-05
Kouchaki, S., Yang, Y., Walker, T.M., Sarah
Walker, A., Wilson, D.J., Peto, T.E.A., Crook, D.W., CRyPTIC Consortium &
Clifton, D.A. 2019. Application of Machine Learning techniques to tuberculosis
drug resistance analysis. Bioinformatics 35(13): 2276-2282.
https://doi.org/10.1093/bioinformatics/bty949
Lopez-Garnier, S., Sheen, P. & Zimic,
M. 2019. Automatic diagnostics of tuberculosis using convolutional neural
networks analysis of MODS digital images. PLoS ONE 14(2): e0212094.
https://doi.org/10.1371/journal.pone.0212094
Marzuki, A., Bagheri, M., Ahmad, A.,
Masron, T. & Akhir, M.F. 2024. Examining transformations in coastal city
landscapes: Spatial patch analysis of sustainable tourism - A case study in
Pahang, Malaysia. Landscape and Ecological Engineering 20: 513-545.
https://doi.org/10.1007/s11355-024-00613-w
Marzuki, A., Bagheri, M., Ahmad, A.,
Masron, T. & Akhir, M.F. 2023. Establishing a GIS-SMCDA model of
sustainable eco-tourism development in Pahang, Malaysia. Episodes 46(3):
375-387. https://doi.org/10.18814/epiiugs/2022/022037
Masron, T., Ahmad, A., Jubit, N., Sulaiman,
M.H., Rainis, R., Redzuan, M.S., Junaini, S.N., Jamian, M.A.H., Mohd Ali, A.S.,
Salleh, M.S., Zaini, F., Soda, R. & Kimura, Y. 2024. Crime Map Book.
Centre for Spatially Integrated Digital Humanities (CSIDH), Faculty of Social
Sciences and Humanities, Universiti Malaysia Sarawak.
https://www.researchgate.net/publication/384572873_Crime_Map_Book
Miotto, R., Li, L., Kidd, B.A. &
Dudley, J.T. 2016. Deep patient: An unsupervised representation to predict
patients’ future from the electronic health records. Scientific Reports 6: 26094. https://doi.org/10.1038/srep26094
Nicholson, T.J., Hoddinott, G., Seddon,
J.A., Claassens, M.M., van der Zalm, M.M., Lopez, E., Bock, P., Caldwell, J.,
Da Costa, D., de Vaal, C., Dunbar, R., Du Preez, K., Hesseling, A.C., Joseph,
K., Kriel, E., Loveday, M., Marx, F.M., Meehan, S.A., Purchase, S., Naidoo, K.,
Naidoo, L., Solomon-Da, C.F., Sloot, R. & Osman, M. 2023b. A systematic review
of risk factors for mortality among tuberculosis patients in South Africa.
Systematic Reviews 12(1): 23. https://doi.org/10.1186/s13643-023-02175-8
Pedregosa, F., Varoquaux, G., Gramfort, A.,
Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R.,
Dubourg, V., Vanderplas, J., Cournapeau, D., Brucher, M., Perrot, M. &
Duchesnay, E. 2011. Scikit-learn: Machine Learning in Python. Journal
of Machine Learning Research 12: 2825-2830. https://dl.acm.org/doi/10.5555/1953048.2078195
Takarinda, K.C., Sandy, C., Masuka, N.,
Hazangwe, P., Choto, R.C., Mutasa-Apollo, T., Nkomo, B., Sibanda, E.,
Mugurungi, O., Harries, A.D. & Siziba, N. 2017. Factors associated with
mortality among patients on TB treatment in the Southern Region of Zimbabwe,
2013. Tuberculosis Research and Treatment 2017: 6232071.
https://doi.org/10.1155/2017/6232071
Tiwari, A. & Maji, S. 2019. Machine
Learning techniques for tuberculosis prediction. International Conference on
Advances in Engineering Science Management & Technology (ICAESMT) - 2019,
Uttaranchal University, Dehradun, India. https://ssrn.com/abstract=3404486
or http://dx.doi.org/10.2139/ssrn.3404486
World Health Organization. 2023. Tuberculosis.
World Health Organization.
https://www.who.int/news-room/fact-sheets/detail/tuberculosis
World Health Organization. 2022. Global
Tuberculosis Report 2022.
https://www.who.int/teams/global-tuberculosis-programme/tb-reports/global-tuberculosis-report-2022
Xie, Y., Han, J., Yu, W., Wu, J., Li, X.
& Chen, H. 2020. Survival analysis of risk factors for mortality in a
cohort of patients with tuberculosis. Canadian Respiratory Journal 2020:
1654653. https://doi.org/10.1155/2020/1654653
Xiong, Y., Ba, X., Hou, A., Zhang, K.,
Chen, L. & Li, T. 2018. Automatic detection of Mycobacterium tuberculosis
using artificial intelligence. Journal of Thoracic Disease 10(3):
1936-1940. https://doi.org/10.21037/jtd.2018.01.91
Yang, S., Zhu, F., Ling, X., Liu, Q. &
Zhao, P. 2021. Intelligent health care: Applications of deep learning in
computational medicine. Frontiers in Genetics 12: 607471. https://doi.org/10.3389/fgene.2021.607471
Zakaria, Y.S., Ahmad, A., Said, M.Z., Epa,
A.E., Ariffin, N.A., M Muslim, A., Akhir, M.F. & Hussin, R. 2023. GIS and
oil spill tracking model in forecasting potential oil spill-affected areas
along Terengganu and Pahang coastal area. Planning Malaysia: Journal of the
Malaysian Institute of Planners 21(4): 250-264.
https://doi.org/10.21837/pm.v21i28.1330
*Corresponding author; email: zarika27@gmail.com